python version
FeatBench: Evaluating Coding Agents on Feature Implementation for Vibe Coding
Chen, Haorui, Li, Chengze, Li, Jia
The rapid advancement of Large Language Models (LLMs) has given rise to a novel software development paradigm known as "vibe coding," where users interact with coding agents through high-level natural language. However, existing code generation benchmarks inadequately assess an agent's vibe coding capabilities. They are misaligned with this paradigm: they either require code-level specifications or focus narrowly on issue solving, neglecting the critical scenario of feature implementation. To address this gap, we propose FeatBench, a novel benchmark for vibe coding that focuses on feature implementation. Our benchmark is distinguished by several key features: 1. Pure Natural Language Prompts. Task inputs consist solely of abstract natural language descriptions, devoid of any code or structural hints. 2. A Rigorous & Evolving Data Collection Process. FeatBench is built on a multi-level filtering pipeline to ensure quality and a fully automated pipeline to evolve the benchmark, mitigating data contamination. 3. Comprehensive Test Cases. Each task includes Fail-to-Pass (F2P) and Pass-to-Pass (P2P) tests to verify correctness and prevent regressions. 4. Diverse Application Domains. The benchmark includes repositories from diverse domains so that it reflects real-world scenarios. We evaluate two state-of-the-art agent frameworks with four leading LLMs on FeatBench. Our evaluation reveals that feature implementation within the vibe coding paradigm is a significant challenge: the highest success rate is only 29.94%. Our analysis also reveals a tendency toward "aggressive implementation," a strategy that paradoxically leads to both critical failures and superior software design. We release FeatBench, our automated collection pipeline, and all experimental results to facilitate further community research.
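The F2P/P2P criterion can be sketched as a small check. This is a minimal sketch, assuming per-test pass/fail results are available as a dictionary after the agent's patch is applied; the function names and test IDs are hypothetical and do not describe FeatBench's actual harness.

```python
from typing import Dict, Iterable


def is_task_resolved(
    results_after: Dict[str, bool],
    f2p_tests: Iterable[str],
    p2p_tests: Iterable[str],
) -> bool:
    """Hypothetical resolution check: every F2P test must now pass
    (the feature works) and every P2P test must still pass (no regression).
    A test missing from `results_after` is treated as failed."""
    f2p_ok = all(results_after.get(t, False) for t in f2p_tests)
    p2p_ok = all(results_after.get(t, False) for t in p2p_tests)
    return f2p_ok and p2p_ok


def success_rate(outcomes: Iterable[bool]) -> float:
    """Percentage of tasks resolved; 0.0 for an empty task list."""
    items = list(outcomes)
    if not items:
        return 0.0
    return 100.0 * sum(items) / len(items)
```

For example, `is_task_resolved({"test_new_flag": True, "test_old_io": False}, ["test_new_flag"], ["test_old_io"])` returns `False`: the new feature works, but an existing behavior regressed.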
adabmDCA 2.0 -- a flexible but easy-to-use package for Direct Coupling Analysis
Rosset, Lorenzo, Netti, Roberto, Muntoni, Anna Paola, Weigt, Martin, Zamponi, Francesco
In this methods article, we provide a flexible but easy-to-use implementation of Direct Coupling Analysis (DCA) based on Boltzmann machine learning, together with a tutorial on how to use it. The package adabmDCA 2.0 is available in several programming languages (C++, Julia, and Python) and runs on different architectures (single-core and multi-core CPUs, GPUs) through a common front-end interface. In addition to several learning protocols for dense and sparse generative DCA models, it lets users directly address common downstream tasks such as residue-residue contact prediction, mutational-effect prediction, scoring of sequence libraries, and generation of artificial sequences for sequence design. It is readily applicable to protein and RNA sequence data.
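DCA models of this kind are commonly written as Potts models with per-site fields h_i(a) and pairwise couplings J_ij(a, b), with energy E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j) and probability P(s) ∝ exp(-E(s)). The sketch below scores a sequence and a single mutation under that convention. It is a plain-Python illustration with assumed nested-list layouts and integer-encoded states, not adabmDCA's API; the package's own conventions (sign, gauge, alphabet encoding) may differ.

```python
from typing import List, Sequence

Fields = List[List[float]]                 # assumed layout: h[i][a]
Couplings = List[List[List[List[float]]]]  # assumed layout: J[i][j][a][b], read only for i < j


def zero_couplings(L: int, q: int) -> Couplings:
    """L x L x q x q nested lists of zeros (each inner list is independent)."""
    return [[[[0.0] * q for _ in range(q)] for _ in range(L)] for _ in range(L)]


def potts_energy(seq: Sequence[int], h: Fields, J: Couplings) -> float:
    """E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j); lower energy = more probable."""
    L = len(seq)
    energy = 0.0
    for i in range(L):
        energy -= h[i][seq[i]]
        for j in range(i + 1, L):
            energy -= J[i][j][seq[i]][seq[j]]
    return energy


def mutation_delta_energy(seq: Sequence[int], pos: int, new_state: int,
                          h: Fields, J: Couplings) -> float:
    """Energy change of a single substitution; negative = favored by the model."""
    mutant = list(seq)
    mutant[pos] = new_state
    return potts_energy(mutant, h, J) - potts_energy(seq, h, J)
```

Recomputing the full energy for each mutant keeps the sketch short; a real scorer would update only the terms involving the mutated site.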
Raiders of the Lost Dependency: Fixing Dependency Conflicts in Python using LLMs
Bartlett, Antony, Liem, Cynthia, Panichella, Annibale
Fixing Python dependency issues is a tedious and error-prone task for developers, who must manually identify and resolve environment dependencies and version constraints of third-party modules and Python interpreters. Researchers have attempted to automate this process by relying on large knowledge graphs and database lookup tables. However, these traditional approaches face limitations due to the variety of dependency error types, large sets of possible module versions, and conflicts among transitive dependencies. This study explores the potential of using large language models (LLMs) to automatically fix dependency issues in Python programs. We introduce PLLM (pronounced "plum"), a novel technique that employs retrieval-augmented generation (RAG) to help an LLM infer Python versions and required modules for a given Python file. PLLM builds a testing environment that iteratively (1) prompts the LLM for module combinations, (2) tests the suggested changes, and (3) provides feedback (error messages) to the LLM to refine the fix. This feedback cycle leverages natural language processing (NLP) to intelligently parse and interpret build error messages. We benchmark PLLM on the Gistable HG2.9K dataset, a collection of challenging single-file Python gists. We compare PLLM against two state-of-the-art automatic dependency inference approaches, namely PyEGo and ReadPyE, w.r.t. the ability to resolve dependency issues. Our results indicate that PLLM can fix more dependency issues than the two baselines, with +218 (+15.97%) more fixes over ReadPyE and +281 (+21.58%) over PyEGo. Our deeper analyses suggest that PLLM is particularly beneficial for projects with many dependencies and for specific third-party numerical and machine-learning modules. Our findings demonstrate the potential of LLM-based approaches to iteratively resolve Python dependency issues.
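The three-step feedback cycle (prompt, test, feed errors back) can be sketched as a generic loop. This is a minimal sketch, assuming two hypothetical callables: `propose`, standing in for the RAG-assisted LLM call, and `try_build`, standing in for building and running the test environment, which returns `None` on success or an error message otherwise. PLLM's retrieval and error-parsing components are omitted.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical candidate format: (python_version, ["module==version", ...])
Candidate = Tuple[str, List[str]]


def iterative_fix(
    propose: Callable[[List[str]], Candidate],
    try_build: Callable[[Candidate], Optional[str]],
    max_rounds: int = 5,
) -> Optional[Candidate]:
    """Propose -> test -> feed errors back, until a candidate builds or rounds run out."""
    errors: List[str] = []
    for _ in range(max_rounds):
        candidate = propose(errors)   # (1) ask the model, passing all prior error messages
        error = try_build(candidate)  # (2) test the suggested environment
        if error is None:
            return candidate          # success: the environment builds and runs
        errors.append(error)          # (3) feedback for the next round
    return None                       # give up after max_rounds attempts
```

For instance, a stub `propose` that first suggests Python 2.7 and switches to 3.8 once it sees a `SyntaxError` in the accumulated errors would succeed on the second round.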
On the Variability of AI-based Software Systems Due to Environment Configurations
Rahman, Musfiqur, Khatoonabadi, SayedHassan, Abdellatif, Ahmad, Samaana, Haya, Shihab, Emad
Software systems are inherently complex. In addition, any ML model is, at its core, probabilistic in nature and hence suffers from the challenge of uncertainty [2, 3, 4]. The complexity of a software system combined with the non-deterministic nature of an ML model can introduce variability: the phenomenon where a piece of software behaves differently when the development or runtime environment changes, even though the internal software artifacts, such as code and input data, are exactly the same. In practice, development and deployment environments are very likely to differ; hence, understanding how an ML model may behave differently after deployment compared to how it behaved in the development environment is a crucial aspect of AI-based software development. For example, a face recognition system that achieves an F1-score of, say, 0.9 in the development environment is not guaranteed to achieve a similar average F1-score once deployed under a different environment configuration.
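To make the F1 example concrete, variability can be quantified by computing the same metric for the same model and test data under each environment configuration and then looking at the spread. This is a minimal sketch for binary labels; the environment names and scores are hypothetical, and the paper's own measurement protocol is not described in this excerpt.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 = 2PR / (P + R), with label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def variability_report(f1_by_env: Dict[str, float]) -> Dict[str, float]:
    """Spread of one metric across environments (same code, same inputs)."""
    scores = list(f1_by_env.values())
    return {
        "mean": mean(scores),
        "std": pstdev(scores),
        "range": max(scores) - min(scores),
    }
```

For example, `variability_report({"dev": 0.90, "prod": 0.84})` reports a range of about 0.06, which is exactly the kind of gap the paragraph above warns about.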
How to Run a ChatGPT-Like LLM on Your PC Offline
There are a number of AI players in the market right now, including ChatGPT, Google Bard, Bing AI Chat, and many more. However, all of them require an internet connection to interact with the AI. What if you want to install a similar Large Language Model (LLM) on your computer and use it locally, as an AI chatbot that you can use privately and without internet connectivity? Well, with the new Alpaca model released by Stanford, you can come close to that reality.
The Bitter Truth: Python 3.11, Cython, C++ Performance
Is Python finally ready for this task? This article compares various approaches to speeding up Python. It should be clear in advance, however, that C is still faster than Python; the question is by how much. The article is tailored for data scientists and people with domain knowledge and Python experience who are interested in results gained from a simulation. It demonstrates the current state of Python's performance using only one example, so it is not a rigorous comparison. It shows which tools are available, how to measure performance gains, and what the best practices are.
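As an illustration of measuring a speedup in plain Python (not the article's simulation, and without the Cython or C++ variants), one common practice is to check that both versions agree, time each several times with the standard `timeit` module, and compare the best runs. The function names here are hypothetical, and which variant wins can depend on the interpreter version.

```python
import platform
import timeit
from typing import Callable


def loop_sum_squares(n: int) -> int:
    """Explicit Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def builtin_sum_squares(n: int) -> int:
    """Same result using the built-in sum over a generator."""
    return sum(i * i for i in range(n))


def best_time(func: Callable[[], object], repeat: int = 5, number: int = 10) -> float:
    """Best per-call time in seconds; the minimum is the least noise-affected estimate."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


def speedup(baseline: Callable[[], object], candidate: Callable[[], object],
            repeat: int = 5, number: int = 10) -> float:
    """How many times faster `candidate` is than `baseline` (>1 means faster)."""
    return best_time(baseline, repeat, number) / best_time(candidate, repeat, number)


if __name__ == "__main__":
    n = 100_000
    assert loop_sum_squares(n) == builtin_sum_squares(n)  # verify correctness before timing
    print(f"Python {platform.python_version()}")
    s = speedup(lambda: loop_sum_squares(n), lambda: builtin_sum_squares(n))
    print(f"builtin sum vs. loop: {s:.2f}x")
```

Reporting the interpreter version alongside the ratio matters here, since the whole comparison hinges on which Python release is being measured.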
GitHub - aws/sagemaker-python-sdk: A library for training and deploying machine learning models on Amazon SageMaker
SageMaker Python SDK is an open source library for training and deploying machine learning models on Amazon SageMaker. With the SDK, you can train and deploy models using the popular deep learning frameworks Apache MXNet and TensorFlow. You can also train and deploy models with Amazon algorithms, which are scalable implementations of core machine learning algorithms that are optimized for SageMaker and GPU training. If you have your own algorithms built into SageMaker-compatible Docker containers, you can train and host models using these as well. For detailed documentation, including the API reference, see Read the Docs.
TensorFlow GPU on Mac
To enable GPU usage on Mac, TensorFlow currently supports only Python 3.8.x. Check out the post below on how to manage multiple Python versions on your OS safely, and then be sure to install Python 3.8.x. Now, this might seem trivial, and you may want to use your GPU always, but believe me, it's easy to mess up a Python environment. Messing up your system environment can be catastrophic, forcing you to completely delete Python and redo your entire setup.
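One cheap safeguard against running in the wrong environment is a fail-fast version check at the top of a script, before TensorFlow is imported. This is a minimal sketch; the function names are hypothetical, and the required version simply mirrors the 3.8.x claim above. A dedicated environment for that interpreter can be created with the standard `venv` module (for example, `python3.8 -m venv tf-env`, assuming a `python3.8` executable is installed).

```python
import sys
from typing import Sequence, Tuple

# Per the post, TensorFlow GPU support on Mac required Python 3.8.x at the time of writing.
REQUIRED_MINOR: Tuple[int, int] = (3, 8)


def is_supported(version: Sequence[int], required: Tuple[int, int] = REQUIRED_MINOR) -> bool:
    """True if the major.minor part of `version` matches `required` (any patch level)."""
    return tuple(version[:2]) == required


def require_supported_python() -> None:
    """Fail fast, before importing TensorFlow, if the interpreter is the wrong version."""
    if not is_supported(tuple(sys.version_info[:2])):
        found = ".".join(str(p) for p in sys.version_info[:3])
        wanted = ".".join(str(p) for p in REQUIRED_MINOR)
        raise RuntimeError(
            f"Python {wanted}.x required, found {found}; "
            "activate a dedicated virtual environment first"
        )
```

Checking only major and minor versions accepts any 3.8 patch release, which matches the "3.8.x" wording.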
Serve your first model with Scikit-Learn + Flask + Docker
One of the first steps in achieving this is to create a process for serving machine learning models to the organization. This is usually done by building an application that runs the prediction model and returns the prediction. In this post, we use a handy stack to create and serve models: Python as the base programming language; the Scikit-Learn package for building the model pipeline (preprocessing the data, training the model, and saving the model to a file); the Flask package for developing a web application that handles the interaction between the client and the prediction model; and, finally, Docker for containerizing the application to prepare it for deployment. In this example, we work with the Breast Cancer Wisconsin (Diagnostic) dataset [1], which is widely used for testing machine learning models. Its features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass, and it was first introduced in K. P. Bennett and O. L. Mangasarian, "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23–34.
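The request/response logic that a Flask prediction route would wrap can be sketched independently of the web framework. This is a minimal sketch, assuming the client sends JSON with a hypothetical `"features"` field holding the 30 numeric features of one sample, and that `model` is any object with a scikit-learn-style `predict` method (in the post's stack, the pipeline loaded from the saved file). It is not the post's code; a Flask view would read the request body, call this function, and return the status code and JSON body.

```python
import json
from typing import Any, Protocol, Sequence, Tuple

N_FEATURES = 30  # Breast Cancer Wisconsin (Diagnostic) has 30 numeric features per sample


class Predictor(Protocol):
    """Anything with a scikit-learn-style predict(X) -> labels method."""
    def predict(self, X: Sequence[Sequence[float]]) -> Sequence[Any]: ...


def handle_predict(body: str, model: Predictor) -> Tuple[int, str]:
    """Parse and validate a JSON request, run the model, return (status_code, json_body)."""
    try:
        payload = json.loads(body)
        features = [float(x) for x in payload["features"]]
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing field, or non-numeric / non-list features.
        return 400, json.dumps({"error": "expected JSON like {\"features\": [30 numbers]}"})
    if len(features) != N_FEATURES:
        return 400, json.dumps({"error": f"expected {N_FEATURES} features, got {len(features)}"})
    prediction = model.predict([features])[0]  # one-row batch in, one label out
    return 200, json.dumps({"prediction": int(prediction)})
```

Keeping validation and prediction in a plain function makes it testable with a stub model, without starting a server or a container.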
Advanced Machine Learning with Basic Excel - DataScienceCentral.com
In this article, I present a few modern techniques that have been used in various business contexts, comparing performance with traditional methods. The advanced techniques in question are math-free, innovative, efficiently process large amounts of unstructured data, and are robust and scalable. Implementations in Python, R, Julia and Perl are provided, but here we focus on an Excel version that does not even require any Excel macros, coding, plug-ins, or anything other than the most basic version of Excel. It is actually easily implemented in standard, basic SQL too, and we invite readers to work on an SQL version. In short, we offer here an Excel template for machine learning and statistical computing, and it is quite powerful for an Excel spreadsheet.